57 research outputs found

    VLSI Design of a Fast Pipelined 8x8 Discrete Cosine Transform

    This paper presents a Very Large Scale Integrated (VLSI) design and implementation of a fixed-point 8x8 multiplierless Discrete Cosine Transform (DCT) using the ISO/IEC 23002-2 algorithm. The standard DCT algorithm, which is mainly used in image and video compression technology, consists of only adders, subtractors, and shifters, therefore making it efficient for hardware implementation. The VLSI implementation of the algorithm given in this paper further enhances the performance of the transform unit. Furthermore, circuit pipelining has been applied to the base design of the DCT, which significantly improves the performance by reducing the longest path in the non-pipeline design. The DCT has been implemented using semi-custom VLSI design methodology using the TSMC 0.13um process technology. Results show that our DCT designs can run up to around 1.7 Giga pixels/s, which is well above the timing required for real-time ultra-high definition 8K video
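    As a rough illustration of what "multiplierless" means in the abstract above, the sketch below replaces multiplication by a fixed-point constant with shifts and adds. The constant 181/256 (an approximation of cos(pi/4)) and the function name are illustrative assumptions; they are not the factors specified in ISO/IEC 23002-2 or the design described in the paper.

```python
def mul_by_181_over_256(x: int) -> int:
    """Approximate x * 0.7071 as (x * 181) >> 8 using only shifts and adds.

    Hypothetical example constant: 181 = 128 + 32 + 16 + 4 + 1.
    """
    # Each term is a left shift of x; their sum equals x * 181.
    acc = (x << 7) + (x << 5) + (x << 4) + (x << 2) + x
    # Arithmetic right shift by 8 divides by 256, rounding toward negative infinity.
    return acc >> 8
```

    In hardware, each shift is just wiring, so a constant multiplication of this kind costs only a handful of adders.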

    A Low-complexity Complex-valued Activation Function for Fast and Accurate Spectral Domain Convolutional Neural Network

    Conventional Convolutional Neural Networks (CNNs), which are realized in the spatial domain, exhibit high computational complexity. This results in high resource utilization and memory usage and makes them unsuitable for implementation in resource- and energy-constrained embedded systems. A promising approach for a low-complexity, high-speed solution is to apply a CNN modeled in the spectral domain. One of the main challenges in this approach is the design of activation functions. Some of the proposed solutions perform activation functions in the spatial domain, necessitating multiple computationally expensive switches between the spatial and spectral domains. On the other hand, recent work on spectral activation functions has resulted in very computationally intensive solutions. This paper proposes a complex-valued activation function for spectral domain CNNs that only transmits input values that have a positive-valued real or imaginary component. This activation function is computationally inexpensive in both forward and backward propagation and provides sufficient nonlinearity to ensure high classification accuracy. We apply this complex-valued activation function in a LeNet-5 architecture and achieve an accuracy gain of up to 7% for the MNIST dataset and 6% for the Fashion MNIST dataset, while providing up to 79% and 85% faster inference times, respectively, over state-of-the-art activation functions for the spectral domain
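    The gating rule in the abstract above can be read in more than one way. The sketch below shows one possible reading, assumed here for illustration only: a value passes through unchanged when its real part or its imaginary part is positive, and is replaced by zero otherwise. The function names are hypothetical and this is not the authors' implementation.

```python
def spectral_gate(z: complex) -> complex:
    """Pass z unchanged if its real or imaginary part is positive, else return 0.

    Assumed interpretation of the abstract; the paper may define the rule differently.
    """
    return z if (z.real > 0 or z.imag > 0) else 0j


def apply_activation(values):
    """Apply the gate element-wise to a sequence of complex values."""
    return [spectral_gate(z) for z in values]
```

    Under this reading the forward pass is a pair of sign checks per element, which is consistent with the abstract's claim that the function is computationally inexpensive.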

    Dataflow Program Analysis and Refactoring Techniques for Design Space Exploration: MPEG-4 AVC/H.264 Decoder Implementation Case Study

    This paper presents a methodology to perform design space exploration of complex signal processing systems implemented using the CAL dataflow language. In the course of space exploration, the critical path in dataflow programs is first presented, and then analyzed using a new strategy for computational load reduction. These techniques, together with the detection of design bottlenecks, point to the most efficient optimization directions in a complex network. Following this analysis, several new refactoring techniques are introduced and applied to the dataflow program in order to obtain feasible design points in the exploration space. For an MPEG-4 AVC/H.264 decoder software and hardware implementation, the multi-dimensional space can be explored effectively for throughput, resource, and frequency, with real-time decoding ranging from QCIF to HD resolutions

    Location closeness model for VANETs with integration of 5G

    Nowadays, 5G is playing a significant role in the efficiency of network security and creating more and faster channels for communication. 5G is evoking industries such as healthcare, education, marketing, transportation, and V2X (Vehicle-to-everything). In addition, 5G is considered a new radio access technology that is adding new applications like the Internet of Things (IoT), Augmented Reality, Virtual Reality, connected cars, connected people-to-people, smart city, and connected homes, which rely on higher bandwidth and low latency. This paper focuses mainly on the security challenges faced by the vehicular ad hoc network (VANET). VANET faces threats in three different fields: security, safety, and infotainment, each of which is exposed to numerous attacks. More precisely, this research conducted an in-depth study and proposed a VANET trust model. The proposed model deals specifically with the "location closeness" parameter. Moreover, the trust model is integrated with the 5G cloud to support greater coverage and effective network density with respect to network infrastructure and IoT as well. In this article, the model is implemented using case studies to validate the trust model based on the "location closeness" parameter. The results proved the valid implementation of the model by identifying the trusted communication between the vehicles

    HEVC 2D-DCT architectures comparison for FPGA and ASIC implementations

    This paper compares ASIC and FPGA implementations of two commonly used architectures for the 2-dimensional discrete cosine transform (DCT), the parallel and folded architectures. The DCT has been designed for sizes 4x4, 8x8, and 16x16, and implemented on a Silterra 180nm ASIC and a Xilinx Kintex UltraScale FPGA. The objective is to determine suitable low-energy architectures, as their characteristics differ greatly in terms of cell usage and placement and routing methods on these platforms. The parallel and folded DCT architectures for all three sizes have been designed using Verilog HDL, including the basic serializer-deserializer input and output. Results show that for the large 16x16 transform, the ASIC parallel architecture results in roughly 30% less energy compared to the folded architecture. As for FPGAs, the folded architecture results in roughly 34% less energy compared to the parallel architecture. In terms of overall energy consumption between the 180nm ASIC and the Xilinx UltraScale, the ASIC implementation results in about 58% less energy compared to the FPGA

    Design space exploration strategies for FPGA implementation of signal processing systems using CAL dataflow program

    This paper presents some strategies for design space exploration of FPGA-based signal processing systems that are specified using the CAL dataflow language. The actor-oriented, high level of abstraction provided by CAL allows flexible exploration and consequently results in a wide range of feasible design implementations. We have applied and extended the existing techniques for refactoring and pipelining actors and actions by means of critical path analysis, and introduced some new buffering techniques based on heuristics. The combinations of these techniques have been applied on the CAL specification of the MPEG-4 video decoder, and synthesized to HDL for evaluation in the design implementation space. Results show that using our configuration for the exploration of 48 design points, a throughput range of roughly 8x has been achieved, for slice, block RAM, frequency, and latency range of 1.3x, 2.5x, 2.5x, and 2.9x respectively

    DTAPO: Dynamic thermal-aware performance optimization for dark silicon many-core systems

    Future many-core systems need to handle high power density and chip temperature effectively. Some cores in many-core systems need to be turned off, or kept 'dark', to manage chip power and thermal density. This phenomenon is known as the dark silicon problem. It prevents many-core systems from utilizing, and gaining improved performance from, a large number of processing cores. This paper presents DTaPO, a dynamic thermal-aware performance optimization technique for dark silicon many-core systems that optimizes system performance under a temperature constraint. The proposed technique utilizes both task migration and dynamic voltage frequency scaling (DVFS) to optimize the performance of a many-core system while keeping the system temperature within a safe operating limit. Task migration puts hot cores in low-power states and moves tasks to cooler dark cores to aggressively reduce chip temperature while maintaining high overall system performance. To reduce task migration overhead due to cold start, the source core (i.e., active core) keeps its L2 cache content during the initial migration phase. The destination core (i.e., dark core) can access it to reduce the impact of cold start misses. Moreover, the proposed technique limits task migration to cores that share the last level cache (LLC). In the case of a major thermal violation with no cooler cores available, DVFS is used to reduce the hot cores' temperature gradually by reducing their frequency. Experimental results for different threshold temperatures show that DTaPO can keep the average system temperature below the thermal limit. The execution time penalty is reduced by up to 18% compared with using only DVFS for all thermal thresholds. Moreover, the average peak temperature is reduced by up to 10.8 °C. In addition, the experimental results show that DTaPO improves the system's performance by up to 80% compared to optimal sprinting patterns (OSP) and reduces the temperature by up to 13.6 °C
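    The migrate-first, DVFS-as-fallback policy described in the abstract above can be sketched as a single control step. This is a minimal illustration under stated assumptions, not the DTaPO algorithm itself: the `Core` fields, the `thermal_step` function, the frequency step, and the minimum frequency are all hypothetical, and the L2 cache retention mechanism is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Core:
    core_id: int
    llc_group: int   # cores with the same value share a last level cache
    temp_c: float
    freq_mhz: int
    active: bool     # False means the core is currently dark


def thermal_step(cores, limit_c, freq_step_mhz=100, min_freq_mhz=400):
    """Run one control step and return the list of actions taken."""
    actions = []
    for hot in cores:
        if not hot.active or hot.temp_c <= limit_c:
            continue  # only active cores above the limit need attention
        # Candidate targets: dark cores in the same LLC group that are below the limit.
        candidates = [c for c in cores
                      if not c.active
                      and c.llc_group == hot.llc_group
                      and c.temp_c < limit_c]
        if candidates:
            target = min(candidates, key=lambda c: c.temp_c)
            hot.active, target.active = False, True  # move the task to the coolest candidate
            actions.append(("migrate", hot.core_id, target.core_id))
        else:
            # No cooler dark core available: lower the frequency by one step.
            hot.freq_mhz = max(min_freq_mhz, hot.freq_mhz - freq_step_mhz)
            actions.append(("dvfs", hot.core_id, hot.freq_mhz))
    return actions
```

    Restricting candidates to the same `llc_group` mirrors the abstract's statement that migration is limited to cores sharing the LLC.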

    Lightweight Trust Model with Machine Learning scheme for secure privacy in VANET

    Get PDF
    Vehicular ad hoc networks (VANETs) are transforming public transport into a safer wireless network, increasing its safety and efficiency. A VANET consists of several nodes, including roadside units (RSUs), vehicles, traffic signals, and other wireless communication devices that communicate sensitive information in the network. Nevertheless, security threats are increasing day by day because of the dependency on network infrastructure, the dynamic nature of the network, and the control technologies used in VANETs. These security threats could be addressed widely by using machine learning and artificial intelligence on the road transport nodes. In this paper, a comparison of trust and cryptography is presented based on the applications and security requirements of VANETs